target signal
Supplementary Information 10 Relation between low pass filter and
Eqn. 3 represents the solution for a stationary energy with respect to the prospective voltage ŭ Here we consider a generalization of the energy function from the main manuscript that includes arbitrary "connectivity functions" f with parameters θ: E(ŭ Pseudo-code for our vanilla implementation can be found in Algorithm 1. Algorithm 1 Pseudo-code for the multi-layer implementation of Latent Equilibrium (LE) Figure 5: Learning to mimic a teacher microcircuit with LE. For the interneurons, the somatic membrane potentials of the pyramidal neurons in the layer above serve as targets. First, the output rate of the neurons must depend on the prospective voltage: ϕ (u) ϕ (ŭ). Note that this includes also the rates in the calculation of dendritic membrane potentials (Eqns. Learning is split into two stages: first, the learning of the so-called self-predicting state and afterwards the learning of the actual task.
Improving Accuracy and Efficiency of Implicit Neural Representations: Making SIREN a WINNER
Chandravamsi, Hemanth, Shenoy, Dhanush V., Frankel, Steven H.
We identify and address a fundamental limitation of sinusoidal representation networks (SIRENs), a class of implicit neural representations. SIRENs Sitzmann et al. (2020), when not initialized appropriately, can struggle at fitting signals that fall outside their frequency support. In extreme cases, when the network's frequency support misaligns with the target spectrum, a 'spectral bottleneck' phenomenon is observed, where the model yields to a near-zero output and fails to recover even the frequency components that are within its representational capacity. To overcome this, we propose WINNER - Weight Initialization with Noise for Neural Representations. WINNER perturbs uniformly initialized weights of base SIREN with Gaussian noise - whose noise scales are adaptively determined by the spectral centroid of the target signal. Similar to random Fourier embeddings, this mitigates 'spectral bias' but without introducing additional trainable parameters. Our method achieves state-of-the-art audio fitting and significant gains in image and 3D shape fitting tasks over base SIREN. Beyond signal fitting, WINNER suggests new avenues in adaptive, target-aware initialization strategies for optimizing deep neural network training. For code and data visit cfdlabtechnion.github.io/siren_square/.
Supplementary Information 10 Relation between low-pass filter and lookahead In general, the prospective (or lookahead) voltage u
Eqn. 3 represents the solution for a stationary energy with respect to the prospective voltage To include synaptic filtering in our theory, we introduce an additional LPF as in Eqn. 10 with time The target signal for the top-layer pyramidal is determined by the training set. We include LE in the dendritic microcircuit by two simple modifications. Learning is split into two stages: first, the learning of the so-called self-predicting state and afterwards the learning of the actual task. The full set of parameters used in Figure 1 and Figure 1 can be found in Section 15.2 . Table 1 lists all the parameters we used for the experiments shown in Figure 1 .
A New Perspective To Understanding Multi-resolution Hash Encoding For Neural Fields
Instant-NGP has been the state-of-the-art architecture of neural fields in recent years. Its incredible signal-fitting capabilities are generally attributed to its multi-resolution hash grid structure and have been used and improved in numerous following works. However, it is unclear how and why such a hash grid structure improves the capabilities of a neural network by such great margins. A lack of principled understanding of the hash grid also implies that the large set of hyperparameters accompanying Instant-NGP could only be tuned empirically without much heuristics. To provide an intuitive explanation of the working principle of the hash grid, we propose a novel perspective, namely domain manipulation. This perspective provides a ground-up explanation of how the feature grid learns the target signal and increases the expressivity of the neural field by artificially creating multiples of pre-existing linear segments. We conducted numerous experiments on carefully constructed 1-dimensional signals to support our claims empirically and aid our illustrations. While our analysis mainly focuses on 1-dimensional signals, we show that the idea is generalizable to higher dimensions.
Quantum Implicit Neural Compression
Fujihashi, Takuya, Koike-Akino, Toshiaki
Signal compression based on implicit neural representation (INR) is an emerging technique to represent multimedia signals with a small number of bits. While INR-based signal compression achieves high-quality reconstruction for relatively low-resolution signals, the accuracy of high-frequency details is significantly degraded with a small model. To improve the compression efficiency of INR, we introduce quantum INR (quINR), which leverages the exponentially rich expressivity of quantum neural networks for data compression. Evaluations using some benchmark datasets show that the proposed quINR-based compression could improve rate-distortion performance in image compression compared with traditional codecs and classic INR-based coding methods, up to 1.2dB gain.
Training Dynamics of Transformers to Recognize Word Co-occurrence via Gradient Flow Analysis
Yang, Hongru, Kailkhura, Bhavya, Wang, Zhangyang, Liang, Yingbin
Understanding the training dynamics of transformers is important to explain the impressive capabilities behind large language models. In this work, we study the dynamics of training a shallow transformer on a task of recognizing co-occurrence of two designated words. In the literature of studying training dynamics of transformers, several simplifications are commonly adopted such as weight reparameterization, attention linearization, special initialization, and lazy regime. In contrast, we analyze the gradient flow dynamics of simultaneously training three attention matrices and a linear MLP layer from random initialization, and provide a framework of analyzing such dynamics via a coupled dynamical system. We establish near minimum loss and characterize the attention model after training. We discover that gradient flow serves as an inherent mechanism that naturally divide the training process into two phases. In Phase 1, the linear MLP quickly aligns with the two target signals for correct classification, whereas the softmax attention remains almost unchanged. In Phase 2, the attention matrices and the MLP evolve jointly to enlarge the classification margin and reduce the loss to a near minimum value. Technically, we prove a novel property of the gradient flow, termed \textit{automatic balancing of gradients}, which enables the loss values of different samples to decrease almost at the same rate and further facilitates the proof of near minimum training loss. We also conduct experiments to verify our theoretical results.
FreSh: Frequency Shifting for Accelerated Neural Representation Learning
Kania, Adam, Mihajlovic, Marko, Prokudin, Sergey, Tabor, Jacek, Spurek, Przemysław
Implicit Neural Representations (INRs) have recently gained attention as a powerful approach for continuously representing signals such as images, videos, and 3D shapes using multilayer perceptrons (MLPs). However, MLPs are known to exhibit a low-frequency bias, limiting their ability to capture high-frequency details accurately. This limitation is typically addressed by incorporating high-frequency input embeddings or specialized activation layers. In this work, we demonstrate that these embeddings and activations are often configured with hyperparameters that perform well on average but are suboptimal for specific input signals under consideration, necessitating a costly grid search to identify optimal settings. Our key observation is that the initial frequency spectrum of an untrained model's output correlates strongly with the model's eventual performance on a given target signal. Leveraging this insight, we propose frequency shifting (or FreSh), a method that selects embedding hyperparameters to align the frequency spectrum of the model's initial output with that of the target signal. We show that this simple initialization technique improves performance across various neural representation methods and tasks, achieving results comparable to extensive hyperparameter sweeps but with only marginal computational overhead compared to training a single model with default hyperparameters. Implicit Neural Representations (INRs) are advancing computer graphics research by integrating classical algorithms with continuous signal representations. They have been successfully applied in signal representation and inverse problems, with notable applications in neural rendering, compression, and 2D and 3D signal reconstruction (Xie et al., 2022). INRs primarily rely on multilayer perceptrons (MLPs), making them susceptible to spectral bias, which refers to the slower convergence of MLPs when approximating high-frequency components of the target signal (Rahaman et al., 2019).
Multichannel-to-Multichannel Target Sound Extraction Using Direction and Timestamp Clues
We propose a multichannel-to-multichannel target sound extraction (M2M-TSE) framework for separating multichannel target signals from a multichannel mixture of sound sources. Target sound extraction (TSE) isolates a specific target signal using user-provided clues, typically focusing on single-channel extraction with class labels or temporal activation maps. However, to preserve and utilize spatial information in multichannel audio signals, it is essential to extract multichannel signals of a target sound source. Moreover, the clue for extraction can also include spatial or temporal cues like direction-of-arrival (DoA) or timestamps of source activation. To address these challenges, we present an M2M framework that extracts a multichannel sound signal based on spatio-temporal clues. We demonstrate that our transformer-based architecture can successively accomplish the M2M-TSE task for multichannel signals synthesized from audio signals of diverse classes in different room environments. Furthermore, we show that the multichannel extraction task introduces sufficient inductive bias in the DNN, allowing it to directly handle DoA clues without utilizing hand-crafted spatial features.